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Welcome!! 


Topics to be covered 


- Welcome to the General Maths 3/4 head start lecture for 2024!


Housekeeping:
- Please feel free to use the chat to ask any questions
- The slides should be accessible below
- The recording will be available after the lecture premiere
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Summary 


BLOCK 1 (20 minutes): OVERVIEW & STUDY TIPS
BLOCK 2 (40 minutes): UNIVARIATE DATA
BLOCK 3 (60 minutes): BIVARIATE AND APPLICATION OF DATA
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Univariate Data Bivariate Data Modelling Data 


Summary 
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Course Overview and Tips 


Data Analysis, Recursion and Financial


SACs: 33% of Final Mark
Exams: 66% of Final Mark
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Overview & Tips 


Accreditation Period 


Updated — version 1.1 




Study Design Change (2023) 


Units 3 and 4: General Mathematics 


General Mathematics Units 3 and 4 focus on real-life application of mathematics and consist of the areas of 


Unit 3 comprises Data analysis and Recursion and financial modelling, and Unit 4 comprises Matrices and 
Networks and decision mathematics. 


Assumed knowledge and skills for General Mathematics Units 3 and 4 are contained in General Mathematics 
Units 1 and 2, and will be drawn on, as applicable, in the development of related content from the areas of 
study, and key knowledge and key skills for the outcomes of General Mathematics Units 3 and 4. 


In undertaking these units, students are expected to be able to apply techniques, routines and processes
involving rational and real arithmetic, sets, lists, tables and matrices, diagrams, networks, algorithms,
mental and by-hand approaches to estimation and computation. The use of numerical, graphical, geometric,
symbolic, statistical and financial functionality of technology for teaching and learning mathematics, for
working mathematically, and in related assessment, is to be incorporated throughout each unit as applicable.


Area of Study 1 


Data analysis 


Students cover data types, representation and distribution of data, location, spread, association, correlation 
and causation, response and explanatory variables, linear regression, data transformation and goodness of 
fit, time series, seasonality, smoothing and prediction.




Overview & Tips What changed for last year?


1. No more module choice
- Schools no longer choose 2 of the 4 smaller modules; everyone must now do
Networks + Matrices


2. Removal of non-causal effect and population statistics / sampling
- This was low-yield content in Data that has now been removed
- Not many exam questions were asked on this

3. Removal of simultaneous equations and representation of linear lines in matrices
- This was super high-yield content that had many exam questions asked about it
- Important to ignore these questions when doing practice questions


4. Addition of Leslie matrices
- We will cover this in a later lecture


5. Lastly, the change of name, from Further to General
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Overview & Tips Calculator 


Calculator guides in textbook - copy them out into your notes in case you go
blank in a SAC or exam!


Your CAS is your best friend, so make sure you know it well


- For my trigonometry friends, you WILL love your CAS because bearings are a thing and can be
a massive pain, so please learn how to use it well!


Shortcuts through menu screens (e.g. menu - 3 - 1 for the solve function)


2 types of calculator, beware! - the TI-Nspire and the Casio ClassPad; use the
one your school makes you get!
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Overview & Tips Textbooks and Study Guide 


Complete all textbook exercises - even if your teacher says only to do left 
hand side or something like that - do ALL the questions!!! 


- Do these THROUGHOUT the year, NOT at the end!
- The textbook questions are amazing for first grasping and building a basic understanding
of concepts; they are, however, NOT useful for building your application and examination skills


Review sections - Identify your weak points in a specific topic 
Cut things out of your textbooks and study guides for your Summary book if 
you find the topics helpful 


External study guides 
- ATAR Notes Course Guides/Topic Tests
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Overview 


Overview & Tips Summary Book 


- There are two (maybe three) models for a Summary book that students
follow:


- Student A, who puts literally everything they see in their textbooks/study guides and
class notes into it. At the end of the year their reference book is THICC, but they have the peace of
mind that everything's in there.


- The negatives:
  - It will be a pain to look through during SACs or Exams and might waste valuable time.
  - It might consume a lot of your study time to make. If using this method, you NEED to stay
up to date!
- The positives:
  - You have the peace of mind that you have all the content and can use it when you get
stuck.
  - Creating a comprehensive summary book is a good way to revise content
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Overview & Tips Summary Book 


- The second type of summary book:


- Student B, who puts together a minimal summary book with only the essential
formulas, usually just a couple of pages printed off from a summary sheet found online.


- The positives:
  - You spend more time doing practice questions, which is SUPER IMPORTANT!
  - You don't waste as much time in an Exam/SAC flicking through your reference book
- The negatives:
  - You're on your own in an Exam/SAC if you get stuck: all you have are formulas and only
the key pieces of information.


- But the one golden rule is... you must create your OWN summary
book!
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ATARNotes 


Area of Study 1 
Data! 


ATARNotes 


1. Univariate Data 


Univariate Data Types of Data 


DATA: “facts and statistics collected together for reference or analysis”


How many variables?
Univariate | Bivariate (2)


What type of variables?
Values / Measurements: Numerical
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Univariate Data Types of Variables 


What type of variables?
Values / Measurements: Numerical


- We can break down categorical and numerical
variables into multiple categories!
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Univariate Data Types of Variables 


Categorical Data

Data is divided into categories

Eg// Eye colour, football team


Nominal: No sensible way to sort the categories!

Eg// Hair colour:
Does it make sense to order blonde, brown, red and black hair? No!


Ordinal: Categories have a natural order, but the interval between them is not specific.

Eg// How satisfied are you?
Very Satisfied
Somewhat Satisfied
Neutral
Somewhat Dissatisfied
Very Dissatisfied
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Univariate Data 


Types of Variables 


Numerical Data

Data that you measure or count

Eg// Height, number of students


Numerical Discrete

Data you can count, can only take on a finite set
of values.

Eg// Number of people at this lecture!
I could count all of you, and I would get a distinct
number. Even if I wanted, I couldn't get a more
‘accurate’ value.


Numerical Continuous

Data that you measure, can take any value
(infinite possibilities)

Eg// How much does a $2 coin weigh?
- 7 grams
- 6.6 grams
- 6.60 grams
- 6.601 grams



Univariate Data Types of Variables 


What type of variables?

Categorical: No order (nominal) or order (ordinal)
Numerical (values / measurements): Count (discrete) or measure (continuous)
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Univariate Data Checklist for Variables 


Is the variable categorical or numerical? 


Can it be manipulated?
i.e. does it make sense to find the mean, median, mode, range, multiply it, add it?

YES: Numerical
NO: Categorical


It makes sense to subtract two heights from each other, so heights are
numerical.

It doesn't make sense to subtract two eye colours from each other, so eye
colours are categorical.
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Univariate Data Checklist for Variables 


Is the variable categorical or numerical? 


Can it be counted or measured?

YES: Numerical
NO: Categorical


Warning: Although numbers usually mean that a
variable is numerical, they don't always!
Categorical variables can contain numbers too!
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Univariate Data Checklist for Variables 


Is the variable categorical or numerical? 


Is the number being used as a name?

NO: Numerical
YES: Categorical


For example, numbers on uniforms are used to identify players; they act as names, and
therefore they are categorical.
Post codes, house numbers and ratings on number scales (e.g. rate out of 5 stars) are other
common categorical variables that use numbers!
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Univariate Data PRACTICE QUESTIONS 


Classify each of the following variables as numerical or categorical, and state whether it is
discrete/continuous or ordinal/nominal:


- How often do you study (often, sometimes, rarely)
- The temperature in degrees Celsius
- The cost to fill a car with a tank of petrol
- Shoe size (6, 8, 10)
- Colour of a pencil (red, green, blue)
- Floor levels in a building (1, 2, 3, 4)
- The number of pages in a book
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Univariate Data PRACTICE QUESTIONS 


Classify each of the following variables as numerical or categorical, and state whether it is
discrete/continuous or ordinal/nominal:


- How often do you study (often, sometimes, rarely): Categorical ordinal
- The temperature in degrees Celsius: Numerical continuous
- The cost to fill a car with a tank of petrol: Numerical discrete
- Shoe size (6, 8, 10): Categorical ordinal
- Colour of a pencil (red, green, blue): Categorical nominal
- Floor levels in a building (1, 2, 3, 4): Categorical ordinal
- The number of pages in a book: Numerical discrete
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Univariate Data PRACTICE QUESTION 


Question 2
The variables blood pressure (low, normal, high) and age (under 50 years, 50 years or over) are
A. both nominal variables.
B. both ordinal variables.
C. a nominal variable and an ordinal variable respectively.
D. an ordinal variable and a nominal variable respectively.
E. a continuous variable and an ordinal variable respectively.


VCAA — 2016 Further Math Exam 1 - Data Analysis - Question 2 
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Univariate Data PRACTICE QUESTION 


Question 2
The variables blood pressure (low, normal, high) and age (under 50 years, 50 years or over) are
A. both nominal variables.
B. both ordinal variables.
C. a nominal variable and an ordinal variable respectively.
D. an ordinal variable and a nominal variable respectively.
E. a continuous variable and an ordinal variable respectively.


VCAA — 2016 Further Math Exam 1 - Data Analysis - Question 2 
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Univariate Data Types of Data 


How many variables? 


DATA 


What type of variables? 


Values / Measurements 


Numerical 
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Univariate Data Univariate Data 


Univariate 


- One variable (uni meaning one)

- Only one thing changes or is manipulated


Colour | Number
Red | 3
Black | 10
White | 8
Silver | 5
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Bivariate Data Bivariate Data 


- Two Variables (bi meaning two) 


- There are two things that change or are manipulated 


        | Walk | Bike | Car | Bus
Year 7  | 5%   | 12%  | 71% | 12%
Year 8  | 7%   | 11%  | 68% | 14%
Year 9  | 9%   | 13%  | 63% | 15%
Year 10 | 9%   | 17%  | 59% | 15%


- Bivariate data is super interesting, more on this later...



Univariate Data Displaying Univariate Data


What type of graph do I use? (Univariate Data)


Frequency tables
Percentage frequency tables
Bar charts
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Univariate Data Frequency Tables 


Provide the variable and the frequency 


Eg// Preferred social media platform? 


Social media platform 


Facebook 26 
Twitter 

Instagram 

Snapchat 
TikTok 
Total 
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Univariate Data Percentage Frequency Tables 


Frequency may also be displayed as percentage 
frequency 


Eg// Preferred social media platform? 


Social media platform | % Frequency
Facebook | 31.33%
Twitter | 22.89%
Instagram | 24.10%
Snapchat | 20.48%
TikTok | 1.20%
Total | 100%
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Overview Overview & Tips 


Univariate Data Bar Charts 


- Variable on the x-axis, frequency on the y-axis 
- Label axes, must rule lines 
- Bars of equal width with space between them 


[Bar chart: “Pets at home”, showing frequency (0 to 150) against type of pet (Cats, Dogs, Birds, Fish)]
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Univariate Data Describing Categorical Data 


Explain/Describe paragraphs pop up every now and then 
Have a template in your summaries 


When answering these kinds of questions, you must: 
Summarise the context 
Identify the mode (also called modal category, dominant category) 
Quote its frequency 
Quote other frequencies of interest 
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Univariate Data Example Question 


Q: Comment on the data shown in the frequency table below.
Hot 6 26.1%
Mild 14 60.9%
Cold 3 13.0%
Total 23 100


The climate types of 23 countries were classified as being “cold”, 

“mild” or “hot”. The majority of the countries, 60.9%, were found to 
have a mild climate. Of the remaining countries, 26.1% were found 
to have a hot climate, while 13% were found to have a cold climate. 
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Univariate Data Example Question 


Hot 6 26.1% 
Mild 14 60.9% 
Cold 3 13.0% 
Total 23 100 


Context 
The climate types of 23 countries were classified as being “cold”, 
“mild” or “hot”. The majority of the countries, 60.9%, were found to 
have a mild climate. Of the remaining countries, 26.1% were found 
to have a hot climate, while 13% were found to have a cold climate. 
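
The percentages and the mode quoted in the paragraph above can be checked with a few lines of Python. This is only a sketch for checking your working: the counts come from the climate table, while the variable names are made up for illustration.

```python
# Frequencies taken from the climate table above.
counts = {"Hot": 6, "Mild": 14, "Cold": 3}

total = sum(counts.values())

# Percentage frequency = frequency / total x 100, rounded to 1 decimal place.
percentages = {category: round(100 * n / total, 1) for category, n in counts.items()}

# The mode (modal category) is the category with the highest frequency.
mode = max(counts, key=counts.get)

for category, n in counts.items():
    print(f"{category:<5} {n:>3} {percentages[category]:>5}%")
print(f"Total {total:>3}")
print(f"Mode: {mode}")
```

In a SAC or exam you would still do this on your CAS or by hand; the sketch just shows where each number in the written description comes from.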
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Univariate Data Example Question 


Hot 6 26.1% 
Mild 14 60.9% 
Cold 3 13.0% 
Total 23 100 


Mode 


The climate types of 23 countries were classified as being “cold”, 

“mild” or “hot”. The majority of the countries, 60.9%, were found to 
have a mild climate. Of the remaining countries, 26.1% were found
to have a hot climate, while 13% were found to have a cold climate. 
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Univariate Data Example Question 


Hot 26.1% 
Mild 60.9% 
Cold 13.0% 
Total 100 


Note other frequencies 
The climate types of 23 countries were classified as being “cold”, 
“mild” or “hot”. The majority of the countries, 60.9%, were found to 
have a mild climate. Of the remaining countries, 26.1% were found 
to have a hot climate, while 13% were found to have a cold climate. 
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Univariate Data FREQUENCY TABLES 


- Variable and frequency displayed 
- Frequency can be real or percentage frequency 


- Data with a large spread may be displayed as “grouped” data 


10 
13 
2 


18 
43 
25 
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Univariate Data DOT PLOTS 


- Used for discrete numerical data

- Only the x-axis is used, displaying the variable

- Number of dots represents frequency
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Univariate Data 


Common question: 


- Finding the median 
- Count the total 
- Find the dot denoting the middle point of the data 
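
The two steps above (count the total, then find the middle dot) can be sketched in Python. The dot plot here is hypothetical: the `dots` dictionary is invented data, not a plot from the lecture.

```python
from statistics import median

# Hypothetical dot plot: value on the x-axis -> number of dots stacked above it.
dots = {0: 2, 1: 4, 2: 5, 3: 3, 4: 1}

# Step 1: count the total number of dots.
total = sum(dots.values())

# Step 2: list every dot in order, then take the middle one
# (or the average of the middle two when the total is even).
values = [value for value, count in sorted(dots.items()) for _ in range(count)]
middle = median(values)

print(total)   # 15 dots
print(middle)  # the 8th dot sits above 2
```

With 15 dots the middle point is the 8th dot, so you count along the plot from the left until you reach it.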
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DOT PLOTS 




Univariate Data STEM AND LEAF PLOTS 


- There are two basic parts:

- Stem: the first digit(s)
- Leaf: the last digit(s)


- For example, the number 41 may be shown as follows:

Stem: 4 (essentially representing 40) | Leaf: 1 (essentially representing 1)
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Univariate Data Stem and Leaf Plots 


Warning: Always include a key/legend with 


your stem plot. 
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Univariate Data Stem and Leaf Plots 


Warning: Always include a key/legend with your 


stem plot. 


Key: 


1|0= 100 
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Univariate Data 


Stem and Leaf Plots


Warning: Always include a key/legend with your stem
plot.


- The leaf is ordered


- Can split the stem into halves or fifths if the plot is
bunched


Stem Leaf 


Overview 
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Univariate Data Histograms 


Similar to bar charts, but no space between columns 
Variable on x-axis, frequency on y-axis 


Useful to identify key features of data that we use in descriptions 
(Shape, spread, centre, outliers) 


Football game attendance 
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Log Scales 


Univariate Data 
- Sometimes, the data's range is too large to display on a regular
histogram
- In these cases, we use log scale histograms as a solution!


- What is a log scale?


Normal scale: Constant addition between marks
0 (+10) 10 (+10) 20 (+10) 30 (+10) 40


Log scale: Constant multiplication between marks
1 (x10) 10 (x10) 100 (x10) 1000 (x10) 10,000
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Overview 


Univariate Data Log Scales 


Properties of logs:
- If a number is greater than 1, its log is greater than 0
- If a number is greater than 0 but less than 1, its log is negative
- If a number is 0, its log is undefined, and you can't have logs of
negative numbers!


Warning: When displaying logs on an axis we only use their order of
magnitude (10^2 becomes 2), though we must label the axis as
log(variable).
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Univariate Data Log Scale Histogram 


Eg// | 


Frequency 


Log(variable) 
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Univariate Data Log Scale Guide 


A handy log guide:

Number: 0.01  | 0.1   | 1    | 10   | 100  | 1000 | 10000 | 100000 | 1000000
Power:  10^-2 | 10^-1 | 10^0 | 10^1 | 10^2 | 10^3 | 10^4  | 10^5   | 10^6
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Univariate Data Converting Logs 


- To find the log of a number
- Eg. What is the log of 150?
- log(150) = 2.176


Use calculator!!


- To find the number from a log
- Eg. Find the number whose log is 1.683
- 10^1.683 = 48.1948
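
If you want to double-check these two conversions away from your CAS, here is a minimal Python sketch. It assumes base-10 logs, as used on the log scale slides, and the variable names are just for illustration.

```python
import math

# Finding the log of a number: log10(150)
log_value = math.log10(150)
print(round(log_value, 3))  # 2.176

# Finding the number from its log: 10 raised to the power of 1.683
number = 10 ** 1.683
print(round(number, 2))  # 48.19
```

Going one way uses log, going back uses 10 to the power of the log; the two operations undo each other.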




Univariate Data PRACTICE QUESTION 


Question 7 


The histogram below shows the distribution of the number of billionaires per million people for the same
53 countries as in Question 6, but this time plotted on a log10 scale.


[Histogram: frequency against log10 (number), with the horizontal axis marked at -1, 0 and 1]


Data: Gapminder


Based on this histogram, the number of countries with one or more billionaires per million people is
A. 1
B. 3
C. 8
D. 9
E. 10


VCAA 2016 Exam 1 — Question 7 
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Univariate Data PRACTICE QUESTION 


Question 7 


The histogram below shows the distribution of the number of billionaires per million people for the same
53 countries as in Question 6, but this time plotted on a log10 scale.


[Histogram: frequency against log10 (number), with the horizontal axis marked at -1, 0 and 1]


Data: Gapminder


Based on this histogram, the number of countries with one or more billionaires per million people is
A. 1
B. 3
C. 8
D. 9
E. 10


VCAA 2016 Exam 1 — Question 7 
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Univariate Data 5 Figure Summary 


Includes: 


The minimum value
The value of quartile 1 (Q1)
The median
The value of quartile 3 (Q3)
The maximum value


We can work this out by hand or on the calculator; depending on
what set of data you have, either one may be quicker!
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Univariate Data 5 Figure Summary - Example 


How can we find the 5 number summary by hand? 


2 5 7 8 9 13 14 15 16 20 21 25 37 41 
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Univariate Data 5 Figure Summary - Example 


How can we find the 5 number summary by hand?


2 5 7 8 9 13 14 15 16 20 21 25 37 41


We can see that we have 14 numbers here, so the median lies between the 7th and 8th values:

2 5 7 8 9 13 14 | 15 16 20 21 25 37 41

Median: (14 + 15) / 2 = 14.5


Q1 is the median of the lower half (2 5 7 8 9 13 14) and Q3 is the median of the upper half (15 16 20 21 25 37 41):

Minimum: 2
Q1: 8
Median: 14.5
Q3: 21
Max: 41
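
The by-hand method can also be written as a short Python function, which is handy for checking answers while you practise. This is a sketch only: `five_number_summary` is a made-up helper name, and it assumes the usual split-the-halves convention (when there is an odd number of values, the median itself is left out of both halves).

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the split-the-halves method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]            # values below the median position
    upper = values[half + n % 2:]    # values above the median position
    return values[0], median(lower), median(values), median(upper), values[-1]

scores = [2, 5, 7, 8, 9, 13, 14, 15, 16, 20, 21, 25, 37, 41]
print(five_number_summary(scores))  # (2, 8, 14.5, 21, 41)
```

The output matches the values found by hand above: minimum 2, Q1 8, median 14.5, Q3 21 and maximum 41.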
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Univariate Data 5 Figure Summary - Dot Plots 


How can we find the 5 number summary from a dot plot? 
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Univariate Data 5 Figure Summary - Stem and Leaf 


How can we find the 5 number summary from a stem plot? 


Stem Leaf 


0|2 


2(3)8 9 


9|9 


4 4(7) 


0 1/9 
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Univariate Data interquartile Range 


What can we do with these details? 


- IQR
  IQR = Q3 - Q1


- Outlier calculations - Extremely common exam question!
  A value below the lower fence or above the upper fence is an outlier.


- Lower fence value
  Q1 - 1.5 x IQR


- Upper fence value
  Q3 + 1.5 x IQR
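
As a worked check of the fence formulas, the sketch below reuses the 14 values and the quartiles (Q1 = 8, Q3 = 21) from the 5 figure summary example. The variable names are illustrative only.

```python
# Data and quartiles from the 5 figure summary example.
scores = [2, 5, 7, 8, 9, 13, 14, 15, 16, 20, 21, 25, 37, 41]
q1, q3 = 8, 21

iqr = q3 - q1                      # 21 - 8 = 13
lower_fence = q1 - 1.5 * iqr       # 8 - 19.5 = -11.5
upper_fence = q3 + 1.5 * iqr       # 21 + 19.5 = 40.5

# Anything outside the fences is an outlier.
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(iqr, lower_fence, upper_fence)  # 13 -11.5 40.5
print(outliers)                       # [41]
```

So in that data set the maximum, 41, sits just above the upper fence of 40.5 and counts as an outlier.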
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Univariate Data 5 Figure Summary - Box Plots 


Visual display of the 5 number summary


[Box plot drawn above a number line from 1 to 26, with Q1, the median and two outliers labelled]
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Univariate Data Describing Numerical Data 


Must discuss: shape, centre, spread and outliers


Outliers: Are there any present? If so, what are they? Also note if
there are no outliers.


Centre: Note the mean or the median, perhaps both
- Mean, median, mode


Spread: How spread out is the data?
- IQR, range, standard deviation



Univariate Data Shape of Data 


Shape:

[Four example distributions: Positively Skewed, Negatively Skewed, Approximately Symmetrical, Bimodal]
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Univariate Data Box Plot Shape 


Once again, we look at outliers, centre, spread and shape


For box plots, shape is displayed as:

[Three box plots on number lines from 1 to 26: Positively Skewed, Negatively Skewed, Approximately Symmetrical]
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Univariate Data Normal Distribution 


mean = median = mode 


Symmetrical 


Approaches zero on 
both sides 
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Univariate Data Standard Deviation 


Describes the spread of data values around the mean


e.g. mean = 5, s_x = 2
- 1 S.D. above mean = 7
- 2 S.D. above mean = 9
- 1 S.D. below mean = 3
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Summary 


Univariate Data Standard Deviation 


Around 68% of the data values lie within one standard deviation 
of the mean. 


Around 95% of the data values lie within two standard deviations 
of the mean. 


Around 99.7% of the data values lie within three standard 
deviations of the mean. 
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Univariate Data PRACTICE QUESTION 


Q1. A class of 24 students receives their science test results, with a 
mean of 32 and a standard deviation of 2. 
How many students received a score between 28 and 32? 
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Univariate Data PRACTICE QUESTION 


Q1. A class of 24 students receives their science test results, with a
mean of 32 and a standard deviation of 2.
How many students received a score between 28 and 32?


32 - 28 = 4
4 / 2 = 2
28 is 2 standard deviations under the mean
34% + 13.5% = 47.5%

24 x 0.475 = 11.4
11 students
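
The same working can be sketched in Python using the bands implied by the 68-95-99.7% rule (34%, 13.5% and 2.35% on each side of the mean). The `BANDS` table and the `percent_between` helper are invented for this illustration, and the helper assumes both scores sit a whole number of standard deviations from the mean.

```python
# Approximate percentage of data in each one-standard-deviation band
# of a normal distribution, from the 68-95-99.7% rule.
# Band k covers from k to k + 1 standard deviations from the mean.
BANDS = {-3: 2.35, -2: 13.5, -1: 34.0, 0: 34.0, 1: 13.5, 2: 2.35}

def percent_between(low, high, mean, sd):
    """Percentage of values between low and high (whole numbers of SDs from the mean)."""
    z_low = round((low - mean) / sd)
    z_high = round((high - mean) / sd)
    return sum(BANDS[k] for k in range(z_low, z_high))

percent = percent_between(28, 32, mean=32, sd=2)
print(percent)              # 47.5
print(24 * percent / 100)   # 11.4, so about 11 students
```

Sketching the bell curve and shading the region between the two scores is still the quickest way to see which bands to add.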
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Summary 


Univariate Data Z-Score 


z = (x - x̄) / s_x

z = z-score
x = actual score
x̄ = mean
s_x = standard deviation
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Univariate Data PRACTICE QUESTION 


Q1. A class of 24 students receives their science test results, with a 
mean of 32 and a standard deviation of 2. 
How many students received a score between 28 and 34? 


Q2. Ben achieved a result of 35, what is his standardised score? 
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Univariate Data PRACTICE QUESTION 


Q1. A class of 24 students receives their science test results, with a 
mean of 32 and a standard deviation of 2. 
How many students received a score between 28 and 34? 


Q2. Ben achieved a result of 35, what is his standardised score? 


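Answers:


Q1. 34% + 34% + 13.5% = 81.5% of students, and 24 x 0.815 = 19.56, so about 20 students

Q2. z = (35 - 32) / 2 = 1.5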
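
For Q2, the standardised score is the actual score minus the mean, divided by the standard deviation. A minimal Python sketch, where `z_score` is just an illustrative helper name:

```python
def z_score(x, mean, sd):
    """Standardised score: how many standard deviations x is from the mean."""
    return (x - mean) / sd

# Ben's result of 35, with a class mean of 32 and standard deviation of 2.
ben = z_score(35, mean=32, sd=2)
print(ben)  # 1.5
```

A z-score of 1.5 means Ben's result is one and a half standard deviations above the class mean.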
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Univariate Data 


DATA


Displaying data:
- Categorical: Frequency tables, Percentage frequency tables
- Numerical: Frequency tables, Dot/Box plots, Stem and leaf plots


Analysing data:
- 5 Figure Summary
- Shape, centre, spread
- 68-95-99.7% rule
- z-scores
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Types of data 


Displaying data 


Analysing data 


Bivariate Data 


Modelling Data 


ATARNotes 


2. Bivariate Data 


Bivariate Data Why Bivariate Data? 


Univariate data is great at telling us the what

- What is the average height of people in this room?
- What is the most popular colour?
- What is the average temperature in Melbourne?


Bivariate data allows us to compare data, and focus on the why

- What is the relationship between age and height?
- Does gender play a role in someone's favourite colour?
- How do the average temperatures in all major Australian cities
compare?
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Bivariate Data Explanatory vs Response Variables 


When we've got more than one variable, we give the variables 
different names. 


Science kids, you’ll know these as independent and dependent 
variables. 


Here, we call them explanatory and response variables. 
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Bivariate Data Explanatory vs Response Variables 


Explanatory variable
- Also known as IV or EV
- The variable that you expect, when changed, will ‘explain’ to some extent
the change in another variable.
- Plotted on the x-axis
- Eg// Age


Response variable
- Also known as DV or RV
- The variable that you think will be changed ‘as a response’ to a
changing EV.
- Plotted on the y-axis
- Eg// Shoe size
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Bivariate Data Displaying Bivariate Data 


Type of Data and Graph

Explanatory Variable | Response Variable | Graph
Categorical | Categorical | Segmented Bar Chart or Two-Way Frequency Table
Numerical | Categorical | Parallel Dot Plots
Numerical | Categorical (2 categories) | Back to Back Stem and Leaf Plots
Numerical | Numerical | Scatterplots
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Bivariate Data Segmented Bar Charts 


- Variable on x-axis, frequency on y-axis


- Can be in terms of raw number or percentage


[Segmented bar chart: “Number of days at temperature levels”, % frequency (0% to 100%) by year (2010, 2011)]
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Bivariate Data SEGMENTED BAR CHARTS 


Note: Make sure to include a key!
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Bivariate Data Two Way Frequency Tables 


Two-Way Frequency Table
- RV = Rows
- EV = Columns


[Two-way percentage frequency table: attitude in the rows, with one column reading For 36%, Against 64%]
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Bivariate Data Two Way Frequency Tables 


What can we see from this?
There seems to be an association


If it was random, we would expect the percentages to be around
50/50, but they're not!


Attitude: For 36%, Against 64%
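
To see where percentages like these come from, here is a rough Python sketch that turns raw counts into column percentages. All of the counts and the group names are hypothetical; the Group A numbers were simply chosen so that they reproduce the 36% and 64% shown above.

```python
# Hypothetical counts: rows are the response variable (attitude),
# columns are the explanatory variable (two made-up groups).
counts = {
    "For":     {"Group A": 9,  "Group B": 20},
    "Against": {"Group A": 16, "Group B": 5},
}

groups = ["Group A", "Group B"]
column_totals = {g: sum(row[g] for row in counts.values()) for g in groups}

# Column percentages: each count divided by its column total.
percentages = {
    attitude: {g: round(100 * row[g] / column_totals[g]) for g in groups}
    for attitude, row in counts.items()
}

for attitude, row in percentages.items():
    print(attitude, row)
```

Because the percentages are calculated down each column (the explanatory variable), each column adds to 100%, and you compare across a row to look for an association.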
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Bivariate Data Back to Back Stem and Leaf Plots 


Back-to-back stem and leaf plots (one numerical variable, one categorical variable)

[Back-to-back stem plot comparing Blue eyes and Brown eyes]
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Bivariate Data Parallel Dot Plots 


- One categorical variable, one numerical variable


- Allows for easy comparison of the distributions'
shape
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Bivariate Data Comparing Distribution 


Can be asked to compare data sets looking at graphs 


Box Plots 

Histograms 

Dot Plots 

Back to Back Stem and Leaf Plots 


We look at: 
- Centre 


- Spread 
- Shape 
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Bivariate Data 


Comparing Box Plots 


— | 


1234567 8 9 1011 1213 14 15 16 17 18 19 20 21 22 23 24 25 26 


Score 
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Comparing Distribution - Example 


The distribution of the boys' scores on the
test is negatively skewed, whilst the
girls' score distribution is positively
skewed. There are no outliers. The
median score for boys is higher (M = 23)
than for girls (M = 9.5). The IQR is
smaller for boys (IQR = 10) than for girls
(IQR = 12). The range of scores for boys
and girls is equal (Range = 19).




Bivariate Data Scatterplots 


Numerical Numerical 


- Used HEAPS in the real world, Super useful! 


- Make sure you know how to plot these on your calculator. 
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Bivariate Data Scatterplots 


Numerical Numerical 


When describing scatterplots we MUST mention three things: 


1. Strength 
2. Direction 
3. Form 
Bivariate Data Pearson's Correlation Coefficient


Otherwise known as the r value 


Measures the strength of a linear relationship 


We generally assume that a linear relationship is present (in some
cases it isn't, but we'll get to that)


Always find the value of r using your calculator; it is not practical
to do by hand.
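
For anyone curious about what the calculator is doing, here is a minimal Python sketch of the standard Pearson formula. The function name `pearson_r` and the data are made up for illustration; in the exam you would still use your CAS.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, not taken from the slides
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))
```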
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Bivariate Data Pearson Correlation Coefficient 


Strong positive: r = 0.75 to 0.99
Moderate positive: r = 0.5 to 0.74
Weak positive: r = 0.25 to 0.49
No association: r = -0.24 to 0.24
Weak negative: r = -0.25 to -0.49
Moderate negative: r = -0.5 to -0.74
Strong negative: r = -0.75 to -0.99


- Size of r value → STRENGTH of the association
- Sign in front of r value → DIRECTION of the association


Warning: r values are only meaningful for linear data sets
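
The table above can be sketched as a small Python helper. The name `describe_r` is hypothetical, and the cut-offs at 0.25, 0.5 and 0.75 are an assumption about how to treat values that fall between the listed ranges (for example 0.745); r = ±1 is treated as strong.

```python
def describe_r(r):
    """Classify r by strength (size) and direction (sign)."""
    size = abs(r)
    if size < 0.25:
        return "no association"
    if size < 0.5:
        strength = "weak"
    elif size < 0.75:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_r(0.8))   # strong positive
print(describe_r(-0.6))  # moderate negative
```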
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Bivariate Data Direction 


- Positive or negative


- A direction suggests that there is an association
between two variables
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Bivariate Data 


- Linear, non-linear or no association


Linear: Data follows a relatively straight line.

Non-linear: Data does not follow a straight-line pattern, but does follow a
curved pattern.

No association: Data points are randomly spread and do not appear to
be associated.


[Figure: four scatterplots labelled positive linear association, negative linear association, nonlinear association, no association]
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Bivariate Data Interpreting r 


- If asked to interpret the value of the correlation
coefficient, use the following template
sentences. Be sure to put these in your bound reference.


Linear, positive and strong: It can be concluded that the y variable should increase as the x variable increases.

Linear, positive and moderate: There is some evidence to suggest that the y variable should increase as the x variable increases.

Linear, positive and weak: There is limited evidence to suggest that the y variable should increase as the x variable increases.

Linear, negative and strong: It can be concluded that the y variable should decrease as the x variable increases.

Linear, negative and moderate: There is some evidence to suggest that the y variable should decrease as the x variable increases.

Linear, negative and weak: There is limited evidence to suggest that the y variable should decrease as the x variable increases.
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Bivariate Data Summary 


Displaying Bivariate Data (explanatory variable / response variable: graph)
- Categorical / Categorical: Segmented Bar Chart or Two Way Frequency Table
- One categorical, one numerical: Parallel Box Plots or Back to Back Stem and Leaf Plots


Describing bivariate distributions: SDF
Strong positive: r = 0.75 to 0.99
Moderate positive: r = 0.5 to 0.74
Weak positive: r = 0.25 to 0.49
No association: r = -0.24 to 0.24
Weak negative: r = -0.25 to -0.49
Moderate negative: r = -0.5 to -0.74
Strong negative: r = -0.75 to -0.99
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ATARNotes 


3. Modelling Data 


Modelling Data Modelling Data 


Bivariate data, and in particular bivariate data with two numerical 
variables, is extremely useful! 


This is because we can use it to construct models: mathematical
equations that allow us to predict the values of data points we
didn't even measure.
Modelling Data Least Squares Line of Best Fit


- How do we come up with a line of best fit?!?! 


Few things: 


- We take the residual = vertical distance between the actual data point and the line of best fit.
- We then make sure our line of best fit minimises the sum of the squares of the residuals.


- Works best if there are no outliers 
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Modelling Data Line of Best Fit 


y = a + bx 


When we have a scatter-plot, we can find a linear equation that 
allows us to predict y from x. 


x is the explanatory variable
y is the response variable
You must make sure you enter these variables in the correct order!


A classic VCAA trick is to give you the y variable before the x.
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Modelling Data Forming the Line of Best Fit 


- General form of a least squares line of best fit is:
y = a + bx


- Two scenarios:
1. If you're given the raw data, use your CAS!
2. If you're given statistics, use the following formulas!


Where:
- The slope/gradient of the line is $b = r \times \frac{s_y}{s_x}$


- The y intercept of the line is $a = \bar{y} - b\bar{x}$

And:
r is the Pearson correlation coefficient
$s_y$ and $s_x$ are the sample standard deviations of y and x respectively.


$\bar{x}$ and $\bar{y}$ are the sample means of x and y
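
Scenario 2 can be sketched in Python as below. The function name `line_from_statistics` and the summary statistics are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def line_from_statistics(r, s_x, s_y, mean_x, mean_y):
    """Return (a, b) for y = a + bx from summary statistics."""
    b = r * s_y / s_x          # slope: b = r * s_y / s_x
    a = mean_y - b * mean_x    # intercept: a = y-bar - b * x-bar
    return a, b

# Hypothetical statistics, not taken from the slides
a, b = line_from_statistics(r=0.8, s_x=2, s_y=5, mean_x=10, mean_y=30)
print(f"y = {a:.1f} + {b:.1f}x")
```

Note the order: the slope is found first, because the intercept formula needs b.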
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Modelling Data Example 


Height (cm): 160, 163, 165, 169, 174, 180, 185, 191
Weight (kg): 60, 63, 70, 67, 72, 75, 71, 77


Find the regression equation used to calculate height from weight


Put data in the CAS (lists and spreadsheet)

Place the weight on the x axis (explanatory) and height on the y axis (response)
Find regression equation

Height = 58.022 + 1.663 × Weight
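
As a check on the CAS output, here is a rough Python sketch of the same least squares fit, with weight as the explanatory variable and height as the response, as the question asks. The helper name `least_squares` is made up, and the fourth weight is read as 67 kg, which is an assumption (it is the value that reproduces the intercept of 58.022).

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

weights = [60, 63, 70, 67, 72, 75, 71, 77]          # x: explanatory (kg)
heights = [160, 163, 165, 169, 174, 180, 185, 191]  # y: response (cm)

a, b = least_squares(weights, heights)
print(f"Height = {a:.3f} + {b:.3f} x Weight")
```

Swapping the two lists gives a different line, which is why the order of the variables matters.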
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Modelling Data Interpreting the Line of Best Fit 


- We can interpret the regression line, y = a + bx, by saying:


- The y intercept is a. This means the y variable is a units when the
x variable is zero units.


- The slope is b. This means that the y variable increases/decreases by
b units for every 1 unit increase in the x variable.


- Use the word 'increases' when b is positive, and 'decreases' when b is
negative.


- Replace the generic terms (the y variable, the x variable, a, b and the units) to fit the context of the question.
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Modelling Data Coefficient of Determination 


- r² → Tells us how well the explanatory variable accounts for the response variable
- On its own it does not prove that x caused y


Explanation: The coefficient of determination tells us that r² × 100% of
the variation in the response variable is explained by the variation in
the explanatory variable


Warning: r² will always come out of the calculator positive.
When finding r from r², we can tell if r is positive or negative by observing the
scatterplot or gradient
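
A quick Python sketch of the warning above: going from r² back to r needs the sign of the gradient. The function name `r_from_r_squared` and the numbers are hypothetical.

```python
from math import sqrt

def r_from_r_squared(r_squared, slope):
    """Recover r from r^2, taking the sign from the slope of the line."""
    magnitude = sqrt(r_squared)
    return magnitude if slope >= 0 else -magnitude

# Hypothetical values: r^2 = 0.64 with a negative gradient
r = r_from_r_squared(0.64, slope=-2.5)
print(round(r, 2))
print(f"{0.64 * 100:.0f}% of the variation in y is explained by the variation in x")
```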




Modelling Data Extra- and Interpolation 


How reliable are these predictions? 
- "Assess the validity"


Interpolation: The x value you are predicting from is within the
range of the data set (a fairly reliable prediction)


Extrapolation: The x value you are predicting from is outside the
range of the data set (an unreliable prediction)
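
The check is just a comparison against the smallest and largest x values. A minimal Python sketch (the function name `prediction_type` is made up; the x values are the weights from the height and weight example):

```python
def prediction_type(x_value, x_data):
    """Say whether predicting from x_value is interpolation or extrapolation."""
    if min(x_data) <= x_value <= max(x_data):
        return "interpolation"
    return "extrapolation"

weights = [60, 63, 70, 67, 72, 75, 71, 77]
print(prediction_type(65, weights))  # inside 60 to 77
print(prediction_type(90, weights))  # outside 60 to 77
```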
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Modelling Data Residuals 


- How can we mathematically check if a scatterplot is linear or not?
- We plot a residual plot!


- To find the residual of a specific point, the equation is:
Residual = Actual y value - Predicted y value
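
Here is a small Python sketch of the residual calculation. The line y = 1 + 2x and the three data points are hypothetical, picked so the answers are whole numbers.

```python
def residuals(xs, ys, a, b):
    """Residual = actual y - predicted y, for the line y = a + bx."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data and line, not taken from the slides
xs = [1, 2, 3]
ys = [3, 5, 6]
res = residuals(xs, ys, a=1, b=2)
print(res)  # the last point sits 1 unit below the line
```

Plotting these residuals against x gives the residual plot.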
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Modelling Data 


RESIDUAL PLOTS

Linear: residuals randomly scattered close to the x-axis (for a perfect fit, every residual is zero)

Non-linear: clear patterns in the residuals

[Figure: example residual plots, residuals plotted against predicted values]




Modelling Data Data Transformation 


The circle of transformations

[Figure: the circle of transformations, showing the possible transformations. Source: Essential Further Mathematics Units 3 & 4, 4th Edition, pg 190.]
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Modelling Data 


Applying an x² transformation


45 10 16 26 
2 3 4 9 
9 16 25 


Therefore, we would graph and find the regression equation of


Mi 245 10 16 26 
Pom. 4 9 16 25 
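
To show the idea, here is a rough Python sketch of an x² transformation. The data is hypothetical (it follows y = x² + 1 exactly) and is not the table from the slide; `least_squares` is a made-up helper name.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

x = [1, 2, 3, 4, 5]
y = [2, 5, 10, 17, 26]            # hypothetical: y = x^2 + 1
x_squared = [v ** 2 for v in x]   # the transformed explanatory variable

# Fit y against x^2 instead of x
a, b = least_squares(x_squared, y)
print(f"y = {a:.2f} + {b:.2f} x (x squared)")
```

The key point is that the regression is done on the transformed column, so the equation must be written in terms of x², not x.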


Modelling Data 


Pre-transformation: non-linear

Post-transformation: data has been linearised (R² = 0.9979)
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Modelling Data 


- Interpreting and calculating lines of best fit
- Correlation coefficient and r²
- Residual plots
- Transformations (the circle of transformations)
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GOOD LUCK <3 


Woosh. You will pass 
your exams. 




4. Time Series 
(Extra content completing U3 AOS1, 
will not be covered in recording) 


Time Series Time Series 


The same as a regular scatterplot, except: 
- Our explanatory (x) variable is time 


- We connect the data points with lines 


[Figure: time series plot, "Sea Level Rise Over Time"; vertical axis: Sea Level Rise (cm)]


Time Series Trend


- Describes what is happening in the long term


- Increasing Trend: Present where there is a positive slope
- Decreasing Trend: Present where there is a negative slope


[Figure: time series plot, "GDP of Australia"]


Time Series Seasonality 


- Peaks/troughs at regular intervals related to the calendar (usually
seasons of the year, but could be weekly, monthly etc.)
- Peaks/troughs are typically similar in size
- Some consistency to the peaks/troughs


[Figure: time series plot, "Money Spent at Department Stores in NSW", 1982 to 2001 (original series). Source: http://www.abs.gov.au/websitedbs/D3310114.nsf/home/Time+Seriest+Analysis:+The+Basics]




Time Series Cycles


Long-term variations that are not seasonal


Cycles do NOT exist within a year; they are only present in time
series extending more than a year


Seasonality can exist within cycles


[Figure: time series plot of the number of lynx trapped]
Time Series Irregular Fluctuations


These are present in basically every time series we look at


Any data point that cannot be attributed to cycles, seasonality,
trends or structural change is classified as irregular


Basically, if ever there is a data point that isn't perfectly in place,
we say there are irregular fluctuations (this was every time series I
came across)


If you've got no idea just guess this lol 
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Time Series Structural Change 


- Where there is a sudden change in the established pattern of a 
time series plot 


- Must be a marked change that is then continued in subsequent 
data 


[Figure: time series plot by month, Jan to Dec, showing a structural change]
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Time Series Outliers 


Individual data points that stand out from the general body of 
data 


Generally caused by a one-off event; a common example is a
financial crisis


[Figure: "Time Series Plot of Weight"]
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Time Series Smoothing Time Series 


Most of the time, time series plots look pretty messy, and this 
makes them hard to read. 


To make the trends on the series a little easier to identify, we use 
a process called smoothing. 


Two methods: 
1. Moving-mean smoothing (numerical) 


2. Moving-median smoothing (graphical)
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Time Series Moving Mean Smoothing 


- Dilutes the effect of large fluctuations
- Basically takes into account the data surrounding each point to
give a clearer trend


t    y     Working         Smoothed y
1    7
2    13    (7+13+6)/3      8.67
3    6     (13+6+14)/3     11
4    14    (6+14+6.5)/3    8.83
5    6.5
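
The working above can be sketched in Python. The function name `moving_mean_3` is made up; the y values are the ones used in the working (7, 13, 6, 14, 6.5).

```python
def moving_mean_3(ys):
    """Three-point moving mean: average each point with its two neighbours."""
    return [(ys[i - 1] + ys[i] + ys[i + 1]) / 3 for i in range(1, len(ys) - 1)]

y = [7, 13, 6, 14, 6.5]
smoothed = [round(v, 2) for v in moving_mean_3(y)]
print(smoothed)  # [8.67, 11.0, 8.83]
```

The first and last points have no smoothed value, because they are missing a neighbour on one side.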
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Time Series Moving Mean Smoothing 


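- Slightly more complicated for even number smoothing: find two neighbouring means, then average them (centring)


Four-point moving mean with centring, using y = 7, 13, 6, 14, 6.5:

(7 + 13 + 6 + 14)/4 = 40/4 = 10
(13 + 6 + 14 + 6.5)/4 = 39.5/4 = 9.88
Centred value at t = 3: (10 + 9.88)/2 = 9.94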
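
A rough Python sketch of four-point smoothing with centring, using the same y values (7, 13, 6, 14, 6.5). The function name `centred_moving_mean_4` is made up for illustration.

```python
def centred_moving_mean_4(ys):
    """Four-point moving mean with centring."""
    # Step 1: mean of each run of four consecutive values
    means = [sum(ys[i:i + 4]) / 4 for i in range(len(ys) - 3)]
    # Step 2: average neighbouring means so the result lines up with a data point
    return [(means[i] + means[i + 1]) / 2 for i in range(len(means) - 1)]

y = [7, 13, 6, 14, 6.5]
centred = centred_moving_mean_4(y)
print(centred)  # [9.9375], which rounds to 9.94
```

The unrounded answer is 9.9375 because the code keeps 9.875 instead of rounding it to 9.88 first.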




Time Series Moving Median Smoothing 


- Uses the graphical representation of data points to find a
smoothed line


Tip: Double check your answers! Maths by sight leaves you open to
errors
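
In the exam this is done by eye on the graph, but if you have the values, a numerical check is possible. Below is a minimal Python sketch of three-point moving median smoothing; the function name `moving_median_3` is made up and the data is the y column from the moving mean example.

```python
from statistics import median

def moving_median_3(ys):
    """Three-point moving median: the middle value of each run of three."""
    return [median(ys[i - 1:i + 2]) for i in range(1, len(ys) - 1)]

y = [7, 13, 6, 14, 6.5]
print(moving_median_3(y))  # [7, 13, 6.5]
```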
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Time Series Moving Median Smoothing 


Question 12 


The time series plot below charts the number of calls per year to a computer help centre over a 10-year 
period. 


2015 VCAA Exam 1 
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Time Series Seasonal Indices 


Another way that we can help make data more easily readable is 
through the use of seasonal indices. 


Sometimes, we want to fit regression lines to time series data (say,
sales figures for a business).


However, seasonality can make it hard to accurately get a linear 
relationship. 


To overcome the effects of seasonality, we can deseasonalise our 
data using seasonal indices. 
Time Series Seasonal Indices


$\text{seasonal index} = \frac{\text{value of season}}{\text{seasonal average}}$


If we add all of the seasonal indices in a data set, we get the
number of seasons (often this is 4, as a lot of data looks at the
four quarters or seasons of a year)


Note: season can mean various things: 
- Month 

- Quarter 

- Weather Seasons 
Time Series Interpreting Seasonal Indices


Interpreting: 


A seasonal index of 1.3 during summer tells us that figures for 
summer are 30% above average 


A seasonal index of 0.87 during winter tells us that figures for winter 
are 13% below average 
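
The two interpretations above follow one rule: compare the index with 1. A small Python sketch (the function name `interpret_seasonal_index` is hypothetical):

```python
def interpret_seasonal_index(index):
    """Turn a seasonal index into a 'percent above/below average' phrase."""
    percent = round(abs(index - 1) * 100)
    if index > 1:
        return f"{percent}% above average"
    if index < 1:
        return f"{percent}% below average"
    return "equal to the average"

print(interpret_seasonal_index(1.3))   # 30% above average
print(interpret_seasonal_index(0.87))  # 13% below average
```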
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Time Series Correcting with Seasonal Indices 


Correcting: 


To correct for seasonality, our formula is $\frac{1}{\text{seasonal index}}$


Warning: People always screw this up, make sure you get it!
Eg. January's seasonal index is 0.8


To correct for seasonality, we should increase the figures for January
by 25% because 1/0.8 = 1.25 (125%)
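
Here is the same calculation as a short Python sketch, which also shows why the common mistake is wrong. The function name `correction_percent` is made up.

```python
def correction_percent(seasonal_index):
    """Percentage change needed to correct a figure for seasonality."""
    return (1 / seasonal_index - 1) * 100

print(round(correction_percent(0.8), 1))   # 25.0: increase by 25%, not 20%
print(round(correction_percent(1.25), 1))  # -20.0: decrease by 20%, not 25%
```

A negative result means the figures should be decreased.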
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Time Series Deseasonalisation 


- Seasonal variation complicates regression, so we deseasonalise
data before fitting a line


- Predictions must, however, take into account seasonal variation


- So, we reseasonalise the predictions from our equation
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Time Series Deseasonalised Values 


- To deseasonalise:


- $\text{Deseasonalised value} = \frac{\text{actual value}}{\text{seasonal index}}$


- To reseasonalise predicted data:


- Actual value = deseasonalised value × seasonal index
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Time Series Making Forecasts 


1. Calculate seasonal indices
2. Deseasonalise the response variable
3. Fit a line to the deseasonalised data
4. Make predictions by substituting in values for time to get a
deseasonalised prediction
5. Reseasonalise this prediction to get the actual prediction
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Time Series Example 


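Season   Summer '11   Autumn '11   Winter '11   Spring '11
Sales    120          93           65           108
SI


1. Calculate seasonal indices


120 + 93 + 65 + 108 = 386
386 / 4 = 96.5 (Seasonal average)


Summer '11 SI = 120 / 96.5 = 1.24
Autumn '11 SI = 93 / 96.5 = 0.96
Winter '11 SI = 65 / 96.5 = 0.67
Spring '11 SI = 108 / 96.5 = 1.12


(Check: the unrounded seasonal indices should add to 4, the number of seasons)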
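
Step 1 can be sketched in Python as below. The sales figures are from the slide; labelling the third and fourth values as Winter and Spring is an assumption based on the Summer and Autumn labels.

```python
sales = {"Summer": 120, "Autumn": 93, "Winter": 65, "Spring": 108}

# Seasonal average = total for the year / number of seasons
seasonal_average = sum(sales.values()) / len(sales)

# Seasonal index = value of season / seasonal average
indices = {season: value / seasonal_average for season, value in sales.items()}
rounded = {season: round(si, 2) for season, si in indices.items()}

print(seasonal_average)       # 96.5
print(rounded)                # Summer 1.24, Autumn 0.96, Winter 0.67, Spring 1.12
print(sum(indices.values()))  # adds to 4 (up to floating point error)
```

The rounded indices add to 3.99 rather than 4, so do the check with unrounded values.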
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Time Series Example 


Sales   120    93     65     108
SI      1.24   0.96   0.67   1.12


2. Deseasonalise the response variable 


120/1.24 = 96.77 
93/0.96 = 96.88 
65/0.67 = 97.01 
108/1.12 = 96.43 
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Time Series 


Sales          120     93      65      108
SI             1.24    0.96    0.67    1.12
Deseas Sales   96.77   96.88   97.01   96.43


3. Fit a line to the deseasonalised data


- To fit a line, you must number each point on the x axis
- Calculate this line the same way you would a line of best fit


Deseasonalised sales = 97 - 0.089 × quarter number


Time Series EXAMPLE

Sales          120     93      65      108
SI             1.24    0.96    0.67    1.12
Deseas Sales   96.77   96.88   97.01   96.43


4. Make predictions using regression line


Predict the deseasonalised sales for Spring '12 (quarter number 8):


97 - 0.089 × 8 = 96.29


5. Reseasonalise to get actual prediction
96.29 × 1.12 = 107.84


(deseasonalised value × seasonal index = Actual value)
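
Steps 3 to 5 can be sketched in Python as below. It uses the rounded deseasonalised values from the table and assumes quarters are numbered 1 to 4 for 2011, so Spring '12 is quarter 8. The helper name `least_squares` is made up. Because the code does not round the intercept to 97, its answers can differ slightly in the second decimal place from figures rounded by hand.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

quarters = [1, 2, 3, 4]
deseasonalised = [96.77, 96.88, 97.01, 96.43]  # rounded values from the table
spring_index = 1.12                            # seasonal index for Spring

# Step 3: fit a line to the deseasonalised data
a, b = least_squares(quarters, deseasonalised)

# Step 4: deseasonalised prediction for quarter 8 (Spring '12)
deseasonalised_prediction = a + b * 8

# Step 5: reseasonalise to get the actual prediction
actual_prediction = deseasonalised_prediction * spring_index

print(f"Deseasonalised sales = {a:.3f} + ({b:.3f}) x quarter number")
print(round(deseasonalised_prediction, 2), round(actual_prediction, 2))
```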
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Time Series Summary 


Features of a time series plot:
- Trends
- Seasonality
- Cycles
- Irregular fluctuations
- Structural changes


Numerical and graphical smoothing

t    y     Working         Smoothed y
1    7
2    13    (7+13+6)/3      8.67
3    6     (13+6+14)/3     11
4    14    (6+14+6.5)/3    8.83
5    6.5


Seasonal indices and deseasonalisation: $DV = \frac{AV}{SI}$


Forecasting
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